Spatio-temporal modelling of referrals to outpatient respiratory clinics in the integrated care system of the Morecambe Bay area, England

Background Promoting integrated care is a key goal of the NHS Long Term Plan to improve population respiratory health, yet there is limited data-driven evidence of its effectiveness. The Morecambe Bay Respiratory Network is an integrated care initiative operating in the North-West of England since 2017. A key target area has been reducing referrals to outpatient respiratory clinics by upskilling primary care teams. This study aims to explore space-time patterns in referrals from general practice in the Morecambe Bay area to evaluate the impact of the initiative. Methods Data on referrals to outpatient clinics and chronic respiratory disease patient counts between 2012-2020 were obtained from the Morecambe Bay Community Data Warehouse, a large store of routinely collected healthcare data. For analysis, the data is aggregated by year and small area geography. The methodology comprises of two parts. The first explores the issues that can arise when using routinely collected primary care data for space-time analysis and applies spatio-temporal conditional autoregressive modelling to adjust for data complexities. The second part models the rate of outpatient referral via a Poisson generalised linear mixed model that adjusts for changes in demographic factors and number of respiratory disease patients. Results The first year of the Morecambe Bay Respiratory Network was not associated with a significant difference in referral rate. However, the second and third years saw significant reductions in areas that had received intervention, with full intervention associated with a 31.8% (95% CI 17.0-43.9) and 40.5% (95% CI 27.5-50.9) decrease in referral rate in 2018 and 2019, respectively. Conclusions Routinely collected data can be used to robustly evaluate key outcome measures of integrated care. The results demonstrate that effective integrated care has real potential to ease the burden on respiratory outpatient services by reducing the need for an onward referral. This is of great relevance given the current pressure on outpatient services globally, particularly long waiting lists following the COVID-19 pandemic and the need for more innovative models of care. Supplementary Information The online version contains supplementary material available at 10.1186/s12913-024-10716-7.

Section 1 provides further detail of the steps that were taken when constructing the analysis data set from the Community Data Warehouse (CDW).Section 2 contains additional information of the spatio-temporal general practice (GP) registration model that was omitted from the main article as this is not the primary focus of the research.Finally, Section 3 contains exploratory data analysis, Markov Chain Monte Carlo (MCMC) methodology, and diagnosis, to supplement the information given in the main article.

Primary data source
The construction of the population data set from the CDW is outlined in the Methods section of the main article.Here we provide additional detail of decisions made, for the sake of transparency: • Duplicate NHS Numbers (most commonly caused by an individual being registered at more than one GP) with agreeing sex, date of birth, and date of death (if applicable), are assumed to be the same person, and the record with the most recent GP registration start date is taken as their current GP practice and address.If there is discrepancy then all records with the given NHS number are excluded.
• In order to be counted in a particular year, an individual's entry date must be prior to the half way point (1 st October) of the given year and their end date after the halfway point.This was done to avoid overestimating the population count, particularly in areas with highly transient populations.
• For a given GP registration in the CDW, only the individual's current address, rather than entire address history, is recorded.A change in address can only be identified if it is accompanied by a change in registered GP.Consequently, it is possible to observe "large" moves in people, but not "smaller" local moves.This limitation is exacerbated by several GP practices in the Morecambe Bay Clinical Commissioning Group (MBCCG) being made up of multiple sites.For example, Lancaster Medical Practice is comprised of eight separate sites spread over central Lancaster, hence an individual could move multiple times living in varied areas, demographically speaking, whilst remaining with the same GP.In addition, since GP registration end date is missing, it is not possible to determine whether there are breaks between registrations, for example if an individual has moved out of the MBCCG then moved back in at a later date.Therefore, we only consider the most recent registration for each distinct individual.Although this does waste some information, given the relatively short length of the study period it should only impact majorly on areas with transient populations which will have a spatial correlation.
• "Regular" registrations only are considered.In England, a "temporary" GP registration can be used whilst away from home for work, study, or on holiday, for up to three months.Individuals with a temporary registration remain registered with their permanent GP surgery during this time.

GP registration model
The primary focus of the main article is referrals to outpatient respiratory clinics and the impact of the MBRN, hence minimal focus is given to the extensive work carried out to model the GPregistered population.In this section we provide extended detail of the model, including motivation, exploratory analysis, methodology, and results.Some information from the main article has been duplicated for the sake of clarity and completeness.

Model motivation
For our study into referrals to outpatient respiratory clinics, it is crucial to adjust for patient burden to avoid producing a model that simply acts as a proxy for chronic respiratory disease (CRD) prevalence.

Exploratory analysis
The outcome variable, denoted P N HS it , is the number of adults ≥ 25 years registered at an MBCCG GP for LSOA i = 1, . . ., N and year t = 1, . . ., T + 1, calculated from NHS Digital.Our study of outpatient referrals is over T = 8 study years (2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019), yet for this model we utilise data from the year 1 st April 2020 -31 st March 2021.We recognise that including the 2020/21 year may introduce COVID-related bias (e.g., people may have been less likely to move house or relocate during the pandemic which would in turn affect GP registration) but it has been included to increase the sample size and improve the model's prediction capacity.
We first consider a generalised linear model (GLM).The outcome counts are sufficiently large (mean = 1181, minimum = 681) to justify a log-Gaussian model as an approximation to the Poisson.The natural logarithm of CDW population count, from which we aim to predict the outcome variable for the unobserved years, is included as a covariate (log(CDW)).Additional covariates are included to adjust for known sources of error, namely year (Year) and proportion of the population registered at a GP not included in the CDW data sharing agreement (Missing).Table S1 where x it is the vector of explanatory variables, β the corresponding regression parameters, and σ 2 the variance of the residual errors.The model captures the spatio-temporal autocorrelation by assigning the random effects a spatio-temporal extension of conditional autoregressive (CAR) priors, which are a type of Gaussian Markov random field (GMRF).Here we follow the model proposed by Rushworth et al. [1], The S T +1 is specified marginally since S T +2 does not exist.We condition in the reverse order since the unobserved data is the furthest back in time as opposed to a future event which is more typically seen in prediction modelling.
The random effects specified in (1) are non-separable in space and time.Temporal correlation is modelled in the conditional expectation via a first-order autoregressive process with dependency parameter ρ T , whereas the spatial autocorrelation is induced via the precision matrix, Q. Numerous specifications for the precision matrix have been proposed in the CAR literature, but here we use that proposed by Leroux et al. [2], Q(ρ S , W ) = ρ S W +(1−ρ S )I where ρ S is the spatial dependency parameter, I the N × N identity matrix, and W an N × N neighbourhood matrix defined for the 204 non-overlapping spatial units that comprise the lattice data for this study.Using the notation i ∼ j to mean "areas i and j share a common border" and n i to be the total number of neighbours for area i, the individual elements of W are: Thus the precision matrix is a weighted average of the spatially dependent and independent structures, accommodating both weak and strong spatial autocorrelation [3].The univariate full conditional distribution better illustrates the spatial relationship, and is given by, The subscript notation −k is used to mean "all elements except k", and so S −i,t is the vector of all random effects at time point t excluding area i.

MCMC methodology
Model fit was carried out by sampling from the posterior distribution of the parameters using MCMC methodology.It is assumed the reader has an understanding of the fundamentals of Bayesian inference and MCMC methods.

Posterior distribution:
Let θ = (β, S, τ 2 , σ 2 , ρ T , ρ S ) be the vector of parameters to be estimated.The joint posterior distribution is: In the above we define S T +2 = 0 to simplify the posterior equation and write π(τ 2 , σ 2 , ρ T , ρ S , β) as shorthand for the prior distributions.

Updating algorithms:
The parameters (τ 2 , σ 2 , ρ T ) were updated via separate Gibbs samplers according to their full conditional posterior distributions: where the parameters of the truncated normal distribution respectively correspond to the mean, variance, minimum value, and maximum value.
The parameter ρ S was updated via a random walk Metropolis step.The tuning parameter was tuned to achieve an acceptance rate between 0.4 and 0.45.
The regression parameters, β, and latent variables, S, were updated jointly using Gaussian Markov random field (GMRF) full conditional sampling techniques outlined in Chapter 2 of Rue and Held (2005) [4].To summarise, we define ϕ = (β, S) and so, where A is the design matrix and V the precision matrix for ϕ.We omit the full specification of V −1 .To ensure V is singular, we enforce the linear constraint i ϕ i = 0. Sparse matrix methods combined with the algorithm for sampling from GMRFs under a linear constraint outlined on page 38 of Rue and Held (2005) were used to improve computational efficiency [4].

Inference
Inference was based on 2,000 independent samples obtained from 250,000 iterations of the algorithm with a burn-in of 50,000 and the remaining 200,000 thinned by a factor of 100 to remove any remaining autocorrelation.Convergence of the MCMC algorithm was established by the Gelman-Rubin convergence diagnostic calculated on three shorter chains to select a suitable length for the burn-in period.Trace plots, density curves, auto-correlation plots, and effective sample size (ESS) calculations were used to assess sufficient mixing of the final chain.The independent variables were standardised prior to model fit to reduce multicollinearity.
The unobserved data for years 2012 and 2013 (corresponding to years t = 1, 2) are treated as missing values in the response vector and are estimated each iteration of the MCMC algorithm according to the posterior predictive distribution: where 0 has been used to denote the current values of the parameters at a given iteration.

Results
In preliminary runs of the MCMC algorithm, the temporal dependency parameter, ρ T , was close to 1 (0.989), suggesting very strong temporal autocorrelation in the error process between NHS Digital data and the CDW.The model was repeated with ρ T fixed at 1 which represents perfect temporal auotcorrelation.The Deviance Information Criterion (DIC), an indicator of model fit for hierarchical models, was equivalent (DIC=-7299) for the models with fixed and variable temporal dependency.Hence for sake of parsimony, we proceeded with the temporal parameter fixed at 1.

Model output
Table S3 shows the results of the model fit.Figure S1 is a boxplot of the LSOA-level true GP-registered population; observed years are shaded in grey and the predicted years shaded in green.The predicted years are very similar to that of 2014.This is supported by census data over the same time period which shows a plateau in the total adult population between 2012-2014 (Figure S2).Although the census population is not identical to the GP-registered population, it is still a good indicator of overall trends [5].The predictive performance of the spatio-temporal model was by leave-one-out cross validation on the observed data (2014-2020).The mean absolute percentage error (MAPE) was calculated for the predictions as well as the percentage of LSOA-level true values within the 95% credible intervals for the corresponding prediction (Figure S4).The model predicts well for years 2014-2019, with a maximum of three LSOAs falling outside the 95% CI.The year 2020 performs considerably worse compared to the others, likely due to impact of the COVID-19 lockdown on movement of people and GP registration behaviour.S3, and the ESS for the latent variables had a median of 2,000 and a minimum of 1,104.0 500 1000 1500 2000 6.08 6.10 6.12 6.14 6.16 iteration b0 6.06 6.08 6.10 6.12 6.14 6.16

Outpatient referrals model
This section supplements the main article by providing additional details of the generalised linear mixed model (GLMM) for outpatient referrals.We first present exploratory data analysis results before describing the specifics of the MCMC algorithm used for model fit.The material is assumed to be read in conjunction with the main article.

Exploratory data analysis
Table S5 provides a summary of the variables used in the referrals model, including data source, general description, and additional notes.

Response variable
Figure S6 shows the time trend in the raw referral counts data by MBRN intervention status.
For the sake of this figure, we dichotomiose the MBRN covariate so that an LSOA is classed as "MBRN" if the proportion of the population registered at an MBRN GP ≥ 50% and "Non-MBRN" otherwise.Prior to the initiation of the MBRN, the time trends are quite similar between the two groups.Post-initiation, the MBRN areas show a dramatic decrease in annual number of referrals whilst non-MBRN areas continue in an upward trend.However, the patterns in this plot do not account for population growth or changes in the demographic or health structure of the populations.

Explanatory variables
Table S6 provides summary statistics of the covariates used in the final model for referrals to outpatient respiratory clinics.S6), yet 84% of the data points are either less than 1% or greater than 99%, with an overall median of 51.5% (S7).Lancaster is the only area that has had widespread full coverage (Figure S8), this is because all GPs in this area are part of larger, multi-site practices.The LSOAs with 0% coverage account for a greater proportion of the MBCCG spatially speaking, due to the differences in population density (illustrated by the sizes of the LSOAs), and yet phase 1 of the MBRN reached 50% of the total MBCCG population.

Overdispersion
We first considered a Poisson GLM to model referrals, with covariates included as described in the main article.The model was overdispersed (mean(Y it ) = 5.5 < 10.0 = var(Y it ); residual deviance = 2092 > 1707 = q(0.95,df = 1612) and random effects models were next considered.An analysis of the residuals did not suggest any significant spatial or temporal correlation.Moran's I statistic was insignificant for study years 2012-2017 and suggested only a weak spatial correlation at 5% significance level for 2018 and 2019 (Moran's I = 0.10 and 0.15 respectively).Therefore, it was concluded that a more complex spatial correlation structure was not necessary and an independent random intercept model was used.

MCMC methodology
Prior distributions: The random effects, Z i (i = 1, . . ., N ), act as latent variables and have independent Normal(0, κ 2 ) priors, as described in the main article.For the remaining parameters, the following priors were used: where a = 1, b = 0.01, λ 2 = 1000, and I p is a p-dimensional identity matrix with p being the number of regression parameters in the referrals model.

Updating algorithms:
The parameter κ 2 was updated via a Gibbs sampler according to the following full conditional posterior distribution: The regression parameters, γ, and latent variables, Z, were updated jointly.If we let ψ = (γ, Z), the posterior distribution for ψ is: where Q is a diagonal matrix of the prior precision and B is the design matrix for ψ.The distribution in (2) does not have a tractable form but can be approximated to a GMRF using the methodology in e.g., Chapter 4 of Rue and Held (2005) [4].The GMRF approximation, which we will denote by q(.), is then used as a proposal distribution in a Metropolis-Hastings step.
We omit the full calculations, but in brief, the approximation uses a second-order Taylor's expansion of the log of the posterior in (2) about the current value of ψ, say ψ 0 .Then, where the matrix C and vector b are functions of ψ 0 : To improve the accuracy of the approximation, the expansion is repeated, with each successive expansion performed around the mean of the previous approximation i.e. the first expansion is around ψ 0 , the second expansion is around µ 1 = C −1 (ψ 0 )b(ψ 0 ), the third expansion is around µ 2 = C −1 (µ 1 )b(µ 1 ), and so on.Preliminary runs of the algorithm found five expansions to be sufficient.Once the approximation is complete, a proposed value, say ψ * , can be sampled from the GMRF according to the same methodology in 2.3.2.As with the spatio-temporal GP registration model, we impose the linear constraint i ψ i = 0 to ensure the precision matrix of the GMRF is invertible.
The acceptance probability of the Metropolis-Hastings step is, Note that the above also requires a Taylor's expansion of π(ψ * |Y, κ 2 ) around ψ * in order to evaluate q(ψ 0 |ψ * ).This is also iterated to improve the accuracy.
Finally, at each iteration of the MCMC algorithm, we randomly sample from the posterior predictions of P N HS it for study years 2012 and 2013, and update the offset term according to the correction formula in Methods: Statistical Analysis: 1. Adjusting CRD patient count of the main article.Using this method, we use the entire posterior predictive sample, as opposed to a point estimate such as the mean of the sample, and thus account for the uncertainty in the predictions.

Inference
Inference was based on 2,000 independent samples obtained from 25,000 iterations of the algorithm, with a burn-in of 5,000 and the remaining 20,000 thinned by a factor of 10.Similarly to the registration model, convergence was established by the Gelman-Rubin convergence diagnostic.Trace plots, density curves, auto-correlation plots, and ESS calculations were used to assess sufficient mixing of the chain.Continuous explanatory variables were standardised prior to model fit to reduce multicollinearity.

Figure S1 :
Figure S1: Spread of LSOA-level GP-registered adult (≥25) population from NHS Digital data for years 2014-2020 (shown in grey) and spatio-temporal model prediction results for years 2012-2013 (shown in green).

Figure S2 :
Figure S2: Total population size for the 204 study LSOAs according to ONS mid-year estimates.

Figure S4 :Figure S5 :
Figure S4: Diagnostic traceplots and density curves for the β parameters in the spatio-temporal GP registration model.
Figure S7: Histogram showing the distribution of the proportion of the popualtion registered at an MBRN GP in 2017.

Figures
Figures S9-S11 show the traceplots and density curves for the parameters in the model.Since there are 21 regression coefficients and 204 latent variables, only a subset of the plots are displayed.The ESS for κ 2 was 1,432.The ESS for the regression coefficients had a median of 2,000 and a minimum of 1,863, and the latent variables had a median of 2,000 and a minimum of 1,231.

Figure S9 :Figure S10 :Figure S11 :
Figure S9: Diagnostic traceplots and density curves for κ 2 in the random intercept outpatient referrals model.
provides further details of the variables used in analysis and TableS2provides summary statistics.The GLM is of + β 1 log(CDW it ) + β 2 Year t + β 3 Missing it .

Table S1 :
Description of variables used in spatio-temporal GP registration model.stJan, 1 st Apr, 1 st Jul, 1 st Oct) since January 2014.LSOA-level given for all-age only.GP-level given in five-year age brackets.Number 25 years or over estimated for each LSOA by multiplying the number registered at each GP by the proportion of that GPs register over 25.Estimates calculated for each quarter and averaged across study years.Two GPs not in the data sharing agreement of the CDW.An additional GP closed in September 2015 (before the CDW was created), patients had to register at a new GP so these patients are "missing" pre-September 2015.Percentage calculated using LSOA-level GP registration data released by NHS Digital.We do with calculations with all-age data and assume this variable not Only consider data for MBCCG GPs and LSOAs within the MBCCG.'Regular' registrations only.'Temporary' registrations are not counted in the NHS Digital Entry date -most recent of study start date (01/04/12), 25 th birthday, and GP registration start date.End date -earliest of date of death and GP registration end date proxy.If not relevant then 'NA'.'Regular' registrations only.'Temporay' registrations not included.to be correlated with age.As with the 'GP-registered population' variable, mean taken across quarters.For the study years 2012 and 2013, the 2014 value is used.Exploratory analysis suggests this variable does not fluctuate year-on-year.

Table S3 :
Median parameter estimates, 95% credible intervals (CI), and effective sample size (ESS) for the spatio-temporal GP-registration model based on 2,000 independent MCMC samples.

Table S4 :
Mean absolute percentage error (MAPE) and proportion of true values that are within the 95% credible interval (CI) for each year predicted.S5 show the traceplots and density curves for the parameters in the model and a subset of the latent variables.The ESS for the model parameters can be seen in Table

Table S6 :
Summary of model covariates over all space-time units.All values are percentages, except the distance variable which is in kilometres.
TableS7shows the change in mean of all time-varying covariates: both age variables show a mostly increasing trend, indicative of an ageing population and percentage male has increased marginally.The proportion of the population registered at a GP that joined the MBRN in 2017 remained

Table S5 :
Description of variables used in the random intercept model of referrals to outpatient respiratory clinics.constant between 2014-2017 before increasing in 2018.MBRN coverage is unobserved for study years 2012 and 2013 as NHS Digital did not release LSOA-level data until 2014.In the final model, we assume the 2012 and 2013 values to be equal to 2014.Note that IMD is not included in TableS7since it is not time varying. mostly

Table S7 :
Mean of time varying GLMM covariates by study year.Figures S7 and S8 both illustrate the spread of the MBRN.Since the MBRN covariate is timevarying we used 2017 data only for these plots.The MBRN covariate is defined as percentage of the total population (Table